We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-grained differences in their visual attributes. By navigating this manifold in directions that increase memorability, we can visualize what it looks like for a particular generated image to become more or less memorable. The resulting "visual definitions" surface image properties (like "object size") that may underlie memorability. Through behavioral experiments, we verify that our method indeed discovers image manipulations that causally affect human memory performance. We further demonstrate that the same framework can be used to analyze image aesthetics and emotional valence. Visit the GANalyze website at http://ganalyze.csail.mit.edu/.
translated by Google Translate
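The navigation step described above can be sketched as moving a latent code z along a direction θ, z' = z + α·θ, and re-scoring the generated image. This is a toy under stated assumptions: the linear `generator` and `assessor` are hypothetical stand-ins for the GAN and the memorability predictor, and θ is set analytically to the score gradient, whereas GANalyze learns θ by training.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 8, 16

# Hypothetical stand-ins: a linear "generator" G(z) = M z and a linear
# "assessor" A(x) = w . x that scores an image (e.g. predicted memorability).
M = rng.normal(size=(IMAGE_DIM, LATENT_DIM))
w = rng.normal(size=IMAGE_DIM)

def generator(z):
    return M @ z

def assessor(image):
    return float(w @ image)

def transform(z, theta, alpha):
    """Move a latent code alpha steps along direction theta: z + alpha * theta."""
    return z + alpha * theta

# For this linear toy, the direction that increases the score fastest is the
# gradient of A(G(z)) with respect to z, i.e. M^T w (normalized to unit length).
theta = M.T @ w
theta /= np.linalg.norm(theta)

z = rng.normal(size=LATENT_DIM)
alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
scores = [assessor(generator(transform(z, theta, a))) for a in alphas]
```

Because the toy score is linear in α, it rises monotonically along θ; with a real GAN and assessor, α only approximately controls the change in the attribute.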
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets (Something-Something, Jester, and Charades) that fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Using only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures in the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common-sense knowledge in videos.
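The relational reasoning performed by TRN can be sketched in NumPy as a sum over ordered frame tuples: for a scale n, T_n(V) = h(Σ g(f_i, f_j, ...)) over indices i < j < ..., and the multi-scale module adds the outputs of several scales. This is a minimal illustration under assumptions: g and h are hypothetical untrained layers with random weights, frame features are random vectors, and all tuples are enumerated (a practical implementation may sample tuples instead).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN_DIM, NUM_CLASSES = 4, 6, 3

def make_relation(n):
    """Hypothetical untrained n-frame relation: h(sum over ordered n-tuples of g(...))."""
    g_weights = rng.normal(scale=0.1, size=(HIDDEN_DIM, n * FEAT_DIM))
    h_weights = rng.normal(scale=0.1, size=(NUM_CLASSES, HIDDEN_DIM))

    def relation(frames):
        pooled = np.zeros(HIDDEN_DIM)
        # combinations() yields indices in increasing (temporal) order: i < j < ...
        for idx in itertools.combinations(range(len(frames)), n):
            tuple_input = np.concatenate([frames[i] for i in idx])
            pooled += np.maximum(g_weights @ tuple_input, 0.0)  # g: linear + ReLU
        return h_weights @ pooled  # h: linear layer to class scores

    return relation

# One relation per time scale (2-frame, 3-frame, 4-frame), built once.
RELATIONS = {n: make_relation(n) for n in (2, 3, 4)}

def multiscale_trn(frames):
    """Sum class scores over all scales that fit the number of frames."""
    return sum(rel(frames) for n, rel in RELATIONS.items() if n <= len(frames))
```

Enumerating tuples in temporal order is what makes the relation sensitive to frame order, unlike simple averaging of per-frame features.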
We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad dataset of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that the interpretability of individual units is equivalent to that of random linear combinations of units; we then apply our method to compare the latent representations of various networks trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
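The unit-scoring step can be sketched as a dataset-wide intersection-over-union (IoU) between a unit's thresholded activation maps and each concept's segmentation masks. The top-0.5% activation quantile and the 0.04 IoU cutoff are assumed values taken from the original Network Dissection setup; activations are assumed to be already upsampled to mask resolution, and `dissect_unit` and `label_unit` are hypothetical helper names.

```python
import numpy as np

def dissect_unit(activations, concept_masks, quantile=0.005):
    """Score one unit against several concepts with dataset-wide IoU.

    activations:   (N, H, W) activation maps of a single unit, assumed already
                   upsampled to the segmentation resolution.
    concept_masks: dict concept -> (N, H, W) boolean segmentation masks.
    """
    # Threshold T_k so that only the top `quantile` fraction of activations pass.
    threshold = np.quantile(activations, 1.0 - quantile)
    unit_mask = activations > threshold
    scores = {}
    for concept, mask in concept_masks.items():
        # Sums run over every image and pixel in the dataset, not per image.
        intersection = np.logical_and(unit_mask, mask).sum()
        union = np.logical_or(unit_mask, mask).sum()
        scores[concept] = float(intersection / union) if union else 0.0
    return scores

def label_unit(scores, iou_threshold=0.04):
    """Return the best-matching concept, or None if no IoU clears the cutoff."""
    best = max(scores, key=scores.get)
    return best if scores[best] > iou_threshold else None
```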
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
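Global average pooling enables localization because the class score is linear in the pooled feature maps, so the same classifier weights can be applied at every location to form a class activation map, M_c(x, y) = Σ_k w_k^c f_k(x, y). A minimal NumPy sketch, assuming a (K, H, W) feature tensor and a (C, K) weight matrix with the bias omitted; upsampling the map to image size is left out.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: (K, H, W) last-conv-layer activations f_k.
    fc_weights:   (C, K) weights of the linear layer after global average pooling.
    """
    return np.einsum("k,khw->hw", fc_weights[class_idx], feature_maps)

def class_score(feature_maps, fc_weights, class_idx):
    """Pre-softmax score S_c = sum_k w_k^c * GAP(f_k), bias omitted."""
    pooled = feature_maps.mean(axis=(1, 2))  # global average pooling
    return float(fc_weights[class_idx] @ pooled)
```

Because pooling is a mean here, the class score equals the spatial mean of the class activation map, so the map decomposes the score across image locations.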
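Autonomous navigation in densely populated areas remains a challenging task for robots because it is difficult to guarantee safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that provides continuous obstacle avoidance and post-contact control on an autonomous vehicle. We propose evaluation metrics to account for efficiency, controller response, and crowd interactions in natural crowds. We report results from over 110 trials in different crowd types (sparse, flows, and mixed traffic) at low (<0.15 ppsm), mid (<0.65 ppsm), and high (<1 ppsm) pedestrian densities, where ppsm denotes people per square meter. We present comparative results between two low-level obstacle avoidance methods and a shared-control baseline. The results show a 10% decrease in relative time on the highest-density tests, with no decrease in any other efficiency metric. Moreover, autonomous navigation proved comparable to shared-control navigation, with lower relative jerk and significantly higher command fluency, indicating high compatibility with the crowd. We conclude that the reactive controller fulfills the necessary task of fast, continuous adaptation for crowd navigation, and that it should be coupled with high-level planners for environmental and situational awareness.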
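The density bands quoted above can be encoded directly, and a smoothness measure such as jerk can be estimated from sampled velocities. A sketch, assuming uniformly sampled speed data with time step `dt`; `mean_abs_jerk` is a generic finite-difference estimate and is not necessarily how the paper defines relative jerk.

```python
import numpy as np

# Density bands quoted in the abstract, in people per square meter (ppsm).
DENSITY_BANDS = [(0.15, "low"), (0.65, "mid"), (1.0, "high")]

def crowd_density(num_pedestrians, area_m2):
    """Pedestrian density in ppsm."""
    return num_pedestrians / area_m2

def density_band(ppsm):
    """Map a density to the first band whose upper bound it falls below."""
    for upper, name in DENSITY_BANDS:
        if ppsm < upper:
            return name
    return "above tested range"

def mean_abs_jerk(velocity, dt):
    """Generic jerk estimate: second finite difference of velocity (m/s^3)."""
    velocity = np.asarray(velocity, dtype=float)
    jerk = np.diff(velocity, n=2) / dt**2
    return float(np.mean(np.abs(jerk)))
```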
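Mobile manipulator throwing is a promising method to increase the flexibility and efficiency of dynamic manipulation in factories. Its main challenge is to efficiently plan a feasible throw under a wide range of task specifications. We analyze the throwing problem and show that it can be reduced to a simpler planar problem, greatly reducing the computational cost. Using data analysis and machine learning, we build models of the object's inverted flying dynamics and of the robot's kinematic feasibility, which generate a throwing motion within 1 ms for a given target-position query. Thanks to the computational efficiency of our method, we show that the system adapts when disturbed during task execution by replanning on the fly to find an alternative throw instead of sticking to the original plan. Code is available at: https://github.com/liuuyangdh/mobile-throwing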
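The planar reduction can be illustrated with drag-free ballistics: once the release point and target are fixed, only the horizontal distance d and height difference h matter, and for a chosen elevation angle θ the release speed satisfies v^2 = g·d^2 / (2·cos^2(θ)·(d·tan(θ) - h)). This sketch is an assumption-laden illustration (no air drag, no robot kinematics) and does not reproduce the paper's learned flying-dynamics or feasibility models; `planar_throw` is a hypothetical name.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def planar_throw(release_pos, target_pos, elevation_rad):
    """Release velocity that sends a drag-free projectile to target_pos.

    The 3D problem collapses to the vertical plane containing both points:
    only the horizontal distance d and the height difference h matter.
    Returns None if the chosen elevation cannot reach the target.
    """
    release_pos = np.asarray(release_pos, dtype=float)
    target_pos = np.asarray(target_pos, dtype=float)
    delta = target_pos - release_pos
    d = np.hypot(delta[0], delta[1])  # horizontal distance within the plane
    h = delta[2]                      # height difference
    denom = 2.0 * np.cos(elevation_rad) ** 2 * (d * np.tan(elevation_rad) - h)
    if d == 0 or denom <= 0:
        return None
    speed = np.sqrt(G * d**2 / denom)
    heading = delta[:2] / d  # unit (x, y) direction from release point to target
    horizontal = speed * np.cos(elevation_rad)
    return np.array([horizontal * heading[0],
                     horizontal * heading[1],
                     speed * np.sin(elevation_rad)])
```

Because the solution is a closed-form function of two plane coordinates and one angle, searching over release states is much cheaper than solving the full 3D problem.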
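In recent years, deep learning (DL) methods have grown dramatically in popularity, and their application to supervised learning problems in the biomedical sciences has increased significantly. However, the higher prevalence and complexity of missing data in modern biomedical datasets pose significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, dlglm, which is one of the first able to flexibly account for both ignorable and non-ignorable patterns of missingness in the input features and the response at training time. Through statistical simulations, we demonstrate that our method outperforms existing approaches for supervised learning tasks in the presence of missing-not-at-random (MNAR) missingness. We conclude with a case study on the Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether a client subscribed to a product based on phone survey data.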
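The difference between ignorable and non-ignorable missingness can be illustrated with a small simulation: when values are missing completely at random (MCAR), the complete-case mean stays unbiased, but when larger values are more likely to be missing (a self-masking MNAR mechanism), it is biased. The logistic missingness model below is a hypothetical choice for illustration and is unrelated to dlglm's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(loc=0.0, scale=1.0, size=n)

def mcar_mask(values, p_missing, rng):
    """Missing completely at random: every entry has the same chance to be missing."""
    return rng.random(values.shape) < p_missing

def mnar_mask(values, rng, slope=2.0):
    """Missing not at random (self-masking): larger values are more likely missing."""
    p_missing = 1.0 / (1.0 + np.exp(-slope * values))
    return rng.random(values.shape) < p_missing

mcar_missing = mcar_mask(x, 0.5, rng)
mnar_missing = mnar_mask(x, rng)

full_mean = x.mean()
mcar_mean = x[~mcar_missing].mean()  # complete-case mean under MCAR
mnar_mean = x[~mnar_missing].mean()  # complete-case mean under MNAR
```

Under MCAR the two means agree up to sampling noise, while under MNAR the observed mean drops well below the true mean: simply dropping incomplete rows cannot correct a mechanism that depends on the missing values themselves.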
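This paper proposes a new approach to learning stable robotic control laws driven by dynamical systems (DS). The approach requires a single demonstration and can infer stable dynamics in arbitrarily high dimensions. It relies on the idea that there exists a latent space in which the nonlinear dynamics appear quasi-linear. The original nonlinear dynamics is mapped into a stable linear DS by leveraging the properties of graph embeddings. We show that the eigendecomposition of the graph Laplacian yields a linear embedding in two dimensions and a quasi-linear one in higher dimensions. The nonlinear terms vanish exponentially as the number of data points increases, and for a large density of points the embedding appears linear. We show that this new embedding can model highly nonlinear dynamics in high dimensions and outperforms alternative techniques in both reconstruction accuracy and the number of parameters required for the embedding. We demonstrate its applicability to controlling a real robot tasked with performing complex free motions in space.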
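The graph-embedding step can be sketched with the standard Laplacian eigenmaps construction: build a similarity graph over the demonstration points, form L = D - W, and use the eigenvectors with the smallest non-zero eigenvalues as embedding coordinates. The dense Gaussian graph and the `bandwidth` parameter are assumptions, and the subsequent mapping to a stable linear DS is not shown.

```python
import numpy as np

def laplacian_embedding(points, n_components=2, bandwidth=1.0):
    """Embed points with eigenvectors of an unnormalized graph Laplacian.

    points: (N, D) samples, e.g. positions along a single demonstration.
    Uses a dense Gaussian-weighted graph (a k-NN graph is a common alternative).
    """
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth**2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    laplacian = np.diag(weights.sum(axis=1)) - weights  # L = D - W
    # eigh returns eigenvalues of the symmetric Laplacian in ascending order.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Skip the constant eigenvector associated with eigenvalue 0.
    return eigvals, eigvecs[:, 1:n_components + 1]
```

For a connected graph the zero eigenvalue is simple, so the returned coordinates are orthogonal to the constant vector and carry the geometry of the demonstration.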
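Modern high-throughput single-cell immune profiling technologies, such as flow and mass cytometry and single-cell RNA sequencing, can readily measure the expression of a large number of protein or gene features across millions of cells in a multi-patient cohort. While bioinformatics approaches can be used to link immune cell heterogeneity to external variables of interest, such as clinical outcome or experimental label, they often struggle to accommodate such a large number of profiled cells. To ease this computational burden, a limited number of cells are typically sketched, or subsampled, from each patient. However, existing sketching approaches fail to adequately subsample rare cells from rare cell populations, or fail to preserve the true frequencies of particular immune cell types. Here, we propose a novel sketching approach based on kernel herding that selects a limited subsample of all cells while preserving the underlying frequencies of immune cell types. We tested our approach on three flow and mass cytometry datasets and on one single-cell RNA sequencing dataset, and demonstrate that the sketched cells (1) more accurately represent the overall cellular landscape and (2) improve performance on downstream analysis tasks, such as classifying patients according to their clinical outcome. An implementation of sketching with kernel herding is publicly available at https://github.com/vishalathreya/set-summarization.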
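Kernel herding greedily picks points whose kernel mean tracks that of the full sample: at step T it selects the candidate x maximizing (1/N)·Σ_j k(x, x_j) - (1/(T+1))·Σ_{t≤T} k(x, x_t). A minimal NumPy sketch, assuming an RBF kernel with a hypothetical `gamma` and selection without replacement; the linked repository may use a different kernel or an approximation.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def kernel_herding(cells, num_samples, gamma=1.0):
    """Greedily pick `num_samples` row indices of `cells` (no repeats) by kernel herding.

    At step T the next point maximizes
        mean_j k(x, x_j) - 1/(T+1) * sum_{t<=T} k(x, x_t),
    i.e. it is typical of the whole sample but far from points already chosen.
    """
    K = rbf_kernel(cells, cells, gamma)
    mean_embedding = K.mean(axis=1)  # empirical kernel mean for every candidate
    penalty = np.zeros(len(cells))   # running sum of k(x, x_t) over chosen x_t
    selected = []
    for t in range(num_samples):
        scores = mean_embedding - penalty / (t + 1)
        if selected:
            scores[selected] = -np.inf  # subsample without replacement
        best = int(np.argmax(scores))
        selected.append(best)
        penalty += K[:, best]
    return selected
```

On a sample with a 90/10 split between two well-separated populations, a 20-cell sketch includes cells from the rare population, because the penalty term lowers the scores of the dominant population as it fills up.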
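Because of the growing number of freight vehicles, electric vehicles (EVs) have been adopted in urban areas to reduce environmental pollution and global warming. However, there are still deficiencies in routing the trajectories of last-mile logistics, which continue to affect social and economic sustainability. Therefore, in this paper, a hyper-heuristic (HH) approach called Hyper-heuristic Adaptive Simulated Annealing with Reinforcement Learning (HHASA_RL) is proposed. It combines a multi-armed bandit method with the adaptive simulated annealing (SA) metaheuristic to solve the Capacitated Electric Vehicle Routing Problem (CEVRP). Because of the limited number of charging stations and the limited driving range of EVs, the vehicles must schedule battery-recharging stops in advance while reducing travel time and cost. The implemented HH improves several best-known minimum solutions and obtains the best mean values for some high-dimensional instances of the benchmark proposed for the IEEE WCCI2020 competition.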
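The combination of a multi-armed bandit with simulated annealing can be sketched on a toy problem. Assumptions: a plain traveling-salesman tour stands in for CEVRP (no capacity or charging constraints), the three low-level heuristics are hypothetical, a UCB1 bandit rewards a heuristic when it improves the current solution, and simple geometric cooling stands in for the paper's adaptive schedule.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))

def swap(tour, rng):
    i, j = rng.sample(range(len(tour)), 2)
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new

def reverse_segment(tour, rng):
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def relocate(tour, rng):
    i, j = rng.sample(range(len(tour)), 2)
    new = tour[:]
    new.insert(j, new.pop(i))
    return new

HEURISTICS = [swap, reverse_segment, relocate]  # hypothetical low-level moves

def hyper_heuristic_sa(pts, iters=5000, t0=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    current = list(range(len(pts)))
    cur_cost = tour_length(current, pts)
    best, best_cost = current[:], cur_cost
    counts = [0] * len(HEURISTICS)
    rewards = [0.0] * len(HEURISTICS)
    temp = t0
    for step in range(1, iters + 1):
        # UCB1: try every arm once, then balance mean reward and exploration.
        if step <= len(HEURISTICS):
            arm = step - 1
        else:
            arm = max(range(len(HEURISTICS)),
                      key=lambda a: rewards[a] / counts[a]
                      + math.sqrt(2 * math.log(step) / counts[a]))
        candidate = HEURISTICS[arm](current, rng)
        cand_cost = tour_length(candidate, pts)
        delta = cand_cost - cur_cost
        improved = delta < 0
        # SA acceptance: always take improvements, worse moves with prob exp(-delta/T).
        if improved or rng.random() < math.exp(-delta / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        counts[arm] += 1
        rewards[arm] += 1.0 if improved else 0.0
        temp *= cooling
    return best, best_cost, counts
```

The bandit learns which move tends to pay off at the current stage of the search, while the annealing acceptance rule lets the search escape local optima early and settle as the temperature drops.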